#HIDDEN
from IPython.core.display import HTML

GLBRC collaboration

Adina, Ashley, Nejc, Shane

  1. MetaG Sequencing and alignment Summaries
    • Inputs:
      • 136 Phred-33 fastq files from samples obtained from washes of Switchgrass and Miscanthus transfered from JGI. Metadata
      • Illumina TruSeq version 3.2 sequencing adapters.
    • Outputs:
      • Adapter Trimming and QC (Trimmomatic) Report
        • The majority of reads are of high quality
      • Host Plant Alignment Report
        • ~80% of the reads at every time point align to the plant assembly. This tapers off toward the end of the season, likely because of the senescence of plant cells.
      • Fungal Alignments Report
        • Low fungal alignment. This may be due to having only a small number of fungal assemblies. The fungal reads in switchgrass peak at during the summer months at 9.1% of the remaining reads.
      • Metagenomic Assembly
        • Based on this overlap, we will go ahead and use the MAG assembly instead of the metagenomic assembly.
        • Method: Reads were trimmed, aligned to the host assembly, and reads with neither pair aligninging were extracted and aligned to the collection of fungal assemblies. All reads with neither pair aligned to the fungal assemblies were extracted again in "cleaned" fastq files. MegaHit was then used with the "--kmin-1pass" and "--presets meta-large" options to generate a metagenomic assembly.
      • MAG Assembly
  2. MetaG Analysis

MetaG Sequencing and Alignment Summaries

Full mapping script

Adapter Trimming and QC (Trimmomatic) Report

What is the abundance and quality of the reads in each sample?

Step 1. Split JGI files in PE files

split-paired-reads.py --gzip -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz \$jgi_combined_fastq

Step 2. Use trimmomatic to QC reads
trimmomatic PE -phred33 \$sample.fastq.pe1.gz \$sample.fastq.pe2.gz \$sample.fastq.pe1.gz \$sample.fastq.se1.gz \$sample.fastq.pe2.gz \$sample.fastq.se2.gz \
   ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE \
   LEADING:3 \
   TRAILING:3 \
   SLIDINGWINDOW:4:15 \
   MINLEN:36

Go back to the Outline

Or skip to:

#HIDDEN
# <!-- # #HIDDEN
# # tImages = {"MetaG":Image("images/TrimReport_MetaG.png"),"MetaT":Image("images/MetaT_trimmomatic_plot.png")}
# # @interact
# # def trimReport(Source=["MetaG","MetaT"]):return tImages[Source] 
# <img src="images/TrimReport_MetaG.png"></img>
# -->
HTML("html/MetaG_TrimStats.html")

Report generated on 2019-10-27, 12:13 based on data in: /mnt/research/ShadeLab/GLBRC/mapping/metaG/fullAssembly/trimStats


Trimmomatic is a flexible read trimming tool for Illumina NGS data.

Host Plant Alignment Report

When the reads are aligned to plant reference genomes, how many of the reads are plant-aligned reads?

Step 1. Align reads to respective host reference assembly

if [[ \$sample == *"G5"* ]]; then
       SAMFILE=\$sample.SWGRASS.sam
       BAMFILE=\$sample.SWGRASS.bam
       bowtie2 --threads \$THREADS -x \$SWITCHGRASS
             -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz
             -U \$sample.fastq.se12.gz -S \$sample.SWGRASS.sam 2>\$sample.stat
else
       SAMFILE=\$sample.MISCANTS.sam
       BAMFILE=\$sample.MISCANTS.bam
       bowtie2 --threads \$THREADS -x \$MISCANTHUS
             -1 \$sample.fastq.pe1.gz -2 \$sample.fastq.pe2.gz
             -U \$sample.fastq.se12.gz -S \$sample.MISCANTS.sam 2>\$sample.stat
fi

Go back to the Outline

Or go to:
#HIDDEN
HTML(filename="html/BothPerc.html")

Sequencing and alignment Summaries

Fungal Alignment Report

What reads are fungal reads?


Step 1. Extract reads that don't align the plant assembly

# A. Convert sam > bam
samtools view -bS \$SAMFILE > \$BAMFILE

# B. Extract all unmapped reads (-f 4) that don't have a mate mapped (-F 256 i.e. both unmapped)
samtools view -b -f 4 -F 256 \$BAMFILE > \$sample.unmapped.bam

# C. Sort the reads
samtools sort -n \$sample.unmapped.bam -o \$sample.unmapped_sorted.bam

# D. Split the unmapped reads in R1 & R2 (back into paired end)
bedtools bamtofastq -i \$sample.unmapped_sorted.bam -fq \$sample.R1.fastq -fq2 \$sample.R2.fastq

Step 2. Align the reads to the combined fungal assemblies
bowtie2 -x \$FUNGAL -1 \$sample.R1.fastq -2 \$sample.R2.fastq -S \$sample.fungal.sam 2>\$sample.fungal.stat

#HIDDEN
HTML(filename='html/FungalAlign.html')

Annotating the metagenomic assembly

Method: Reads were trimmed, aligned to the host assembly, and reads with neither pair aligninging were extracted and aligned to the collection of fungal assemblies. All reads with neither pair aligned to the fungal assemblies were extracted again in "cleaned" fastq files. MegaHit was then used with the "--kmin-1pass" and "--presets meta-large" options to generate a metagenomic assembly.

The annotations were performed using KEGG's prokaryotic peptides and eukaryotic peptides

Based on the high-overlap, we decided to use the mags instead of the assembly for alignment

MAG Stats

MAGs generated with reads that did not align to the fungal/host assemblies using metabat. Stats were then generated with checkm and the command below, results for that stats are below that. checkm lineage_wf -t 20 -x fa Final.contigs.fa.metabat-bins20 metaBinsStats
#HIDDEN
HTML(filename="html/MAG_Stats.html")

Table of the bins with ≥ 50% completeness

Bin Id Marker lineage Unnamed: 2 # Genomes # Markers # Marker sets 0 1 2 3 4 5+ Completeness Contamination Strain heterogeneity
0 bin.7 o__Burkholderiales (UID4105) 54 553 264 3 534 16 0 0 0 99.66 2.65 43.75
1 bin.39 f__Enterobacteriaceae (UID5054) 223 874 303 38 829 7 0 0 0 99.05 0.70 71.43
2 bin.67 o__Pseudomonadales (UID4488) 185 812 308 35 751 26 0 0 0 97.79 3.53 46.15
3 bin.157 o__Actinomycetales (UID1593) 69 400 198 21 364 14 1 0 0 96.84 4.73 29.41
4 bin.125 o__Actinomycetales (UID1663) 488 310 185 18 276 16 0 0 0 93.96 4.23 68.75
5 bin.164 o__Burkholderiales (UID4000) 193 427 214 44 379 4 0 0 0 92.27 1.64 50.00
6 bin.117 o__Rhizobiales (UID3654) 92 481 319 66 401 14 0 0 0 90.51 3.00 78.57
7 bin.202 o__Cytophagales (UID2936) 47 454 336 82 362 10 0 0 0 89.90 2.68 80.00
8 bin.181 o__Rhizobiales (UID3654) 92 481 319 73 381 27 0 0 0 89.84 5.71 33.33
9 bin.188 o__Burkholderiales (UID4000) 193 427 214 77 296 51 3 0 0 87.56 15.48 60.00
10 bin.198 o__Cytophagales (UID2936) 47 454 336 54 379 21 0 0 0 87.28 3.43 61.90
11 bin.121 o__Burkholderiales (UID4000) 193 427 214 68 339 20 0 0 0 87.25 5.02 30.00
12 bin.95 o__Burkholderiales (UID4002) 107 574 251 67 493 14 0 0 0 87.18 2.41 78.57
13 bin.214 o__Flavobacteriales (UID2815) 123 324 204 67 252 5 0 0 0 87.11 1.49 100.00
14 bin.195 k__Bacteria (UID203) 5449 104 58 25 54 16 8 1 0 82.99 35.50 78.26
15 bin.105 o__Actinomycetales (UID1593) 69 400 198 104 282 14 0 0 0 75.84 4.46 28.57
16 bin.88 k__Bacteria (UID203) 5449 104 58 28 39 22 10 5 0 73.82 38.55 34.15
17 bin.142 c__Deltaproteobacteria (UID3216) 83 247 155 49 194 4 0 0 0 73.12 2.58 25.00
18 bin.148 o__Sphingomonadales (UID3310) 26 569 293 154 398 17 0 0 0 72.70 2.26 17.65
19 bin.112 o__Flavobacteriales (UID2815) 123 324 204 100 214 10 0 0 0 71.14 2.71 80.00
20 bin.91 o__Sphingomonadales (UID3310) 26 569 293 191 261 99 18 0 0 69.76 25.37 22.22
21 bin.206 o__Rickettsiales (UID3809) 83 324 211 86 226 12 0 0 0 69.27 3.82 58.33
22 bin.93 o__Actinomycetales (UID1802) 274 385 212 105 254 25 1 0 0 68.43 8.09 53.57
23 bin.114 o__Actinomycetales (UID1663) 488 310 185 92 215 3 0 0 0 67.03 0.69 33.33
24 bin.104 k__Bacteria (UID203) 5449 103 57 52 38 12 1 0 0 67.01 12.41 46.67
25 bin.134 o__Rhizobiales (UID3654) 92 481 319 183 288 10 0 0 0 66.91 2.00 80.00
26 bin.163 o__Rickettsiales (UID3809) 83 324 211 101 216 7 0 0 0 66.80 2.53 14.29
27 bin.146 o__Actinomycetales (UID1697) 387 330 193 131 190 9 0 0 0 64.35 2.27 0.00
28 bin.205 o__Actinomycetales (UID1697) 387 330 193 125 198 7 0 0 0 63.48 3.11 42.86
29 bin.200 k__Bacteria (UID203) 5449 104 58 59 43 2 0 0 0 62.47 2.59 100.00
30 bin.57 o__Cytophagales (UID2936) 47 454 336 192 256 6 0 0 0 60.73 1.79 50.00
31 bin.10 k__Bacteria (UID203) 5449 104 58 63 41 0 0 0 0 60.69 0.00 0.00
32 bin.208 c__Alphaproteobacteria (UID3305) 564 337 221 134 193 10 0 0 0 60.38 2.96 90.00
33 bin.2 o__Actinomycetales (UID1815) 120 574 266 221 331 22 0 0 0 59.37 3.03 36.36
34 bin.175 o__Burkholderiales (UID4000) 193 427 214 171 246 8 2 0 0 58.90 2.35 50.00
35 bin.116 k__Bacteria (UID203) 5449 104 58 61 31 11 0 1 0 58.79 18.97 100.00
36 bin.66 o__Sphingomonadales (UID3310) 26 569 293 263 287 17 2 0 0 56.39 5.49 52.17
37 bin.139 o__Actinomycetales (UID1593) 69 400 198 165 203 31 0 1 0 56.31 7.46 64.86
38 bin.156 k__Bacteria (UID203) 5449 104 58 60 39 5 0 0 0 56.04 6.19 100.00
39 bin.55 k__Bacteria (UID203) 5449 103 57 65 28 9 1 0 0 56.01 13.16 66.67
40 bin.84 o__Burkholderiales (UID4000) 193 427 214 212 206 9 0 0 0 55.53 2.95 11.11
41 bin.171 k__Bacteria (UID203) 5449 104 58 35 31 24 14 0 0 55.22 22.74 3.03
42 bin.132 f__Rhizobiaceae (UID3564) 78 840 354 352 470 18 0 0 0 52.64 1.16 61.11
43 bin.54 k__Bacteria (UID203) 5449 103 57 69 34 0 0 0 0 52.63 0.00 0.00
44 bin.199 k__Bacteria (UID203) 5449 104 58 65 37 2 0 0 0 52.47 3.45 50.00
45 bin.115 k__Bacteria (UID203) 5449 104 58 43 38 12 11 0 0 52.30 19.17 66.67
46 bin.120 o__Sphingomonadales (UID3310) 26 569 293 274 271 23 1 0 0 51.10 5.80 38.46
47 bin.33 k__Bacteria (UID203) 5449 104 58 68 36 0 0 0 0 50.17 0.00 0.00
48 bin.65 k__Bacteria (UID203) 5449 104 58 64 39 1 0 0 0 49.32 1.72 100.00

MetaG Analysis

PCoA

Is time a factor that affects the functional richness of phylosphere communities (Alpha Diversity)?

Switchgrass Pearson's product-moment correlation
Data: Time and Richness
t = 10.223 df = 62 p-value = 6.316e-15
sample estimates: cor 0.7922484
Switchgrass Pearson's product-moment correlation
Data: Time and Shannon
t = 7.1534 df = 62 p-value = 1.166e-09
sample estimates: cor 0.6724267
Switchgrass Pearson's product-moment correlation
Data: Time and Pielou
t = 5.6951 df = 62 p-value = 3.622e-07
sample estimates: cor 0.586056
Miscanthus Pearson's product-moment correlation
Data: Time and Richness
t = 8.5659 df = 70 p-value = 1.658e-12
sample estimates: cor 0.7153784
Miscanthus Pearson's product-moment correlation
Data: Time and Shannon
t = 6.1267 df = 70 p-value = 4.675e-08
sample estimates: cor 0.5908107
Miscanthus Pearson's product-moment correlation
Data: Time and Pielou
t = 7.1602 df = 70 p-value = 6.369e-10
sample estimates: cor 0.6502079

Do we see changes over time in functional richness?

Call: adonis(formula = dist.otu ~ map_16S\$time_numeric)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$time_numeric 1 1.9356 1.93562 47.814 0.26298 0.001 ***
Residuals 134 5.4246 0.04048 0.73702
Total 135 7.3603 1.000000

images/Level1_Mags.png

images/Level2_Mags.png

Is there a difference between treatment (Fertilized vs. Unfertilized)?

Call: adonis(formula = dist.otu ~ map_16S\$treatment)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$treatment 1 0.0289 0.028930 0.52878 0.00393 0.755
Residuals 134 7.3313 0.054711 0.99607
Total 135 7.3603 1.000000

Is there a difference between host plant?

Call: adonis(formula = dist.otu ~ map_16S\$plant)

Permutation: free Number of permutations: 999

Terms added sequentially (first to last)

Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
map_16S$plant 1 0.1617 0.161704 3.0101 0.02197 0.027 *
Residuals 134 7.1986 0.053721 0.97803
Total 135 7.3603 1.000000

Is there similarity between the WGS communities and the 16S data (mantel/procrustes)?

Mantel statistic based on Pearson's product-moment correlation Call: mantel(xdis = genes.dist, ydis = otu.dist) Mantel statistic r: 0.6526 Significance: 0.001 Upper quantiles of permutations (null model): 90% 95% 97.5% 99% 0.0833 0.1194 0.1458 0.1849 Permutation: free Number of permutations: 999 Call: protest(X = genes.pcoa, Y = otu.pcoa) Procrustes Sum of Squares (m12 squared): 0.5402 Correlation in a symmetric Procrustes rotation: 0.6781 Significance: 0.001 Permutation: free Number of permutations: 999